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ABSTRACT:  Temperature-based  replica-exchange  molecular  dynamics  (REMD),  in  which  multiple  simultaneous  simulations,  or 
replicas,  are  run  at  a  range  of  temperatures,  has  become  increasingly  popular  for  exploring  the  energy  landscape  of  biomolecular 
systems.  The  practical  application  of  REMD  toward  systems  of  biomedical  interest  is  often  limited  by  the  rapidly  increasing  number 
of  replicas  needed  to  model  systems  of  larger  size.  Continuum  solvent  models,  which  replace  the  explicit  modeling  of  solvent 
molecules  with  a  mean-field  approximation  of  solvation,  decrease  system  size  and  correspondingly,  the  number  of  replicas,  but  can 
sometimes  produce  distortions  of  the  free  energy  landscape.  We  present  a  hybrid  implicit/explicit  solvent  REMD  method  in 
CHARMM  in  which  replicas  run  in  a  purely  explicit  solvent  regime  while  exchanges  are  implemented  with  a  high-density  GBMV2 
implicit  solvation  model.  Such  a  hybrid  approach  may  be  able  to  decrease  the  number  of  replicas  needed  to  model  larger  systems 
while  maintaining  the  accuracy  of  explicit  solvent  simulations.  Toward  that  end,  we  run  REMD  using  implicit  solvent,  explicit 
solvent,  and  our  hybrid  method,  on  three  model  systems:  alanine  dipeptide,  a  zwitterionic  tetra-peptide,  and  a  10-residue  /3-hairpin 
peptide.  We  compare  free  energy  landscape  in  each  system  derived  from  a  variety  of  metrics  including  dihedral  torsion  angles,  salt- 
bridge  distance,  and  folding  stability,  and  perform  clustering  to  characterize  the  resulting  structural  ensembles.  Our  results  identify 
discrepancies  in  the  free-energy  landscape  between  implicit  and  explicit  solvent  and  evaluate  the  capability  of  the  hybrid  approach  to 
decrease  the  number  of  replicas  needed  for  REMD  while  reproducing  the  energy  landscape  of  explicit  solvent  simulations. 


■  INTRODUCTION 

Molecular  dynamics  (MD)  simulations  of  proteins  can  pro¬ 
vide  deep  insight  into  the  microscopic  mechanisms  that  guide 
their  structure  and  function.  Accurate  modeling  of  such  systems 
typically  requires  an  atomistic  representation  of  both  the  protein 
molecule  and  thousands  of  individual  solvent  molecules  that 
make  up  its  local  aqueous  environment.  The  energy  landscape  of 
these  complex  systems  contain  innumerable  local  minima  which 
hinders  conformational  sampling  and  present  a  major  challenge 
to  accurate  and  efficient  modeling.  A  number  of  enhanced  sam¬ 
pling  methods  have  been  developed  to  address  this  issue.  These 
include  replica  exchange,1  accelerated  molecular  dynamics,2  self- 
guided  molecular  dynamics,3  potential  smoothing,4  locally  en¬ 
hanced  sampling,5  and  continuum  solvent  models. 

Temperature-based  replica-exchange  molecular  dynamics 
(REMD)  has  become  an  increasingly  popular  method  for  over¬ 
coming  these  challenges  in  protein  modeling  for  a  wide  range  of 
applications,  from  simulating  equilibrium  behavior  to  predicting 
structure.6  REMD  typically  involves  running  multiple  simulta¬ 
neous  simulations,  or  replicas,  at  a  wide-range  of  temperatures, 
while  allowing  exchanges  of  temperatures  between  these  replicas. 
The  exchanges  between  two  neighboring  temperature  replicas 
are  attempted  at  regular  intervals  and  made  when  the  Metropolis 
criterion  is  met.  This  criterion  relates  the  relative  probability 
of  finding  each  conformation  at  a  given  temperature  as  a  function 
of  their  respective  energies.  REMD  allows  conformations  to 
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effectively  percolate  up  and  down  the  ladder  of  temperatures. 
Higher  temperature  replicas  rapidly  overcome  potential  energy 
barriers  that  impede  sampling  while  lower  temperature  replicas 
sample  lower-energy  conformations  that  are  relevant  to  a  room- 
temperature  ensemble. 

Although  traditional  REMD  has  enjoyed  success  in  modeling 
peptides  and  small  proteins,  as  the  system  size  grows  larger, 
REMD  becomes  increasingly  impractical.  As  larger  proteins  are 
modeled,  larger  solvation  boxes  with  a  greater  number  of  solvent 
molecules  are  needed.  This  quickly  increases  the  size  and  com¬ 
plexity  of  the  system.  The  number  of  replicas  required  for  effi¬ 
cient  REMD  is  a  function  of  the  overlap  in  energy  distribution 
between  neighboring  replicas.  As  the  system  size  grows,  the 
difference  in  energy  between  a  given  temperature  range  in¬ 
creases,  and  a  larger  number  of  replicas  are  needed  to  span  that 
range.7,8  An  explicit  solvent  system  for  even  a  small  protein,  such 
as  the  20-residue  Trp-cage  mini-protein  requires  more  than  40 
replicas;9  for  pharmacologically  relevant  proteins  which  are  often 
several  hundred  residues  in  length,  potentially  hundreds  of 
replicas  may  be  required.  Alternative  replica-exchange  methods 
have  been  developed  that  decrease  the  necessary  number  of 
replicas  including  nonequilibrium  replica  exchange  and  biasing- 
potential  replica  exchange,11  they  are  generally  incompatible  with 


Received:  July  29,  2011 
Published:  December  01,  2011 


677 


dx.doi.org/l  0.1 021  /ct200529b  I J.  Chem.  Theory  Comput.  2012,  8,  677-687 


Report  Documentation  Page 


Form  Approved 
OMB  No.  0704-0188 


Public  reporting  burden  for  the  collection  of  information  is  estimated  to  average  1  hour  per  response,  including  the  time  for  reviewing  instructions,  searching  existing  data  sources,  gathering  and 
maintaining  the  data  needed,  and  completing  and  reviewing  the  collection  of  information.  Send  comments  regarding  this  burden  estimate  or  any  other  aspect  of  this  collection  of  information, 
including  suggestions  for  reducing  this  burden,  to  Washington  Headquarters  Services,  Directorate  for  Information  Operations  and  Reports,  1215  Jefferson  Davis  Highway,  Suite  1204,  Arlington 
VA  22202-4302.  Respondents  should  be  aware  that  notwithstanding  any  other  provision  of  law,  no  person  shall  be  subject  to  a  penalty  for  failing  to  comply  with  a  collection  of  information  if  it 
does  not  display  a  currently  valid  OMB  control  number. 


1.  REPORT  DATE 

01  DEC  2011 


2.  REPORT  TYPE 


3.  DATES  COVERED 

00-00-2011  to  00-00-2011 


4.  TITLE  AND  SUBTITLE 


5a.  CONTRACT  NUMBER 


5b.  GRANT  NUMBER 


5c.  PROGRAM  ELEMENT  NUMBER 


5d.  PROIECT  NUMBER 


5e.  TASK  NUMBER 


5f.  WORK  UNIT  NUMBER 


Efficient  Conformational  Sampling  in  Explicit  Solvent  Using  a  Hybrid 
Replica  Exchange  Molecular  Dynamics  Method 

6.  AUTHOR(S) 


7.  PERFORMING  ORGANIZATION  NAME(S)  AND  ADDRESS(ES)  8.  PERFORMING  ORGANIZATION 

U.S.  Army  Medical  Research  and  Materiel  Command, Biotechnology  report  number 

High  Performance  Computing  Software  Applications 

Institute, Telemedicine  and  Advanced  Technology  Research  Center, Fort 

Detrick, MD, 21702 

9.  SPONSORING/MONITORING  AGENCY  NAME(S )  AND  ADDRESS(ES )  10.  SPONSOR/MONITOR' S  ACRONYM(S) 

II.  SPONSOR/MONITOR'S  REPORT 
NUMBER(S) 

12.  DISTRIBUTION/AVAILABILITY  STATEMENT 

Approved  for  public  release;  distribution  unlimited 

13.  SUPPLEMENTARY  NOTES 

14.  ABSTRACT 

Temperature-based  replica-exchange  molecular  dynamics  (REMD),  in  which  multiple  simultaneous 
simulations,  or  replicas,  are  run  at  a  range  of  temperatures,  has  become  increasingly  popular  for  exploring 
the  energy  landscape  of  biomolecular  systems.  The  practical  application  of  REMD  toward  systems  of 
biomedical  interest  is  often  limited  by  the  rapidly  increasing  number  of  replicas  needed  to  model  systems  of 
larger  size.  Continuum  solvent  models,  which  replace  the  explicit  modeling  of  solvent  molecules  with  a 
mean-field  approximation  of  solvation,  decrease  system  size  and  correspondingly,  the  number  of  replicas, 
but  can  sometimes  produce  distortions  of  the  free  energy  landscape.  We  present  a  hybrid  implicit/explicit 
solvent  REMD  method  in  CHARMM  in  which  replicas  run  in  a  purely  explicit  solvent  regime  while 
exchanges  are  implemented  with  a  high-density  GBMV2  implicit  solvation  model.  Such  a  hybrid  approach 
may  be  able  to  decrease  the  number  of  replicas  needed  to  model  larger  systems  while  maintaining  the 
accuracy  of  explicit  solvent  simulations.  Toward  that  end,  we  run  REMD  using  implicit  solvent,  explicit 
solvent,  and  our  hybrid  method,  on  three  model  systems:  alanine  dipeptide,  a  zwitterionic  tetra- peptide, 
and  a  10-residue  &#946; -hairpin  peptide.  We  compare  free  energy  landscape  in  each  system  derived  from 
a  variety  of  metrics  including  dihedral  torsion  angles,  saltbridge  distance,  and  folding  stability,  and 
perform  clustering  to  characterize  the  resulting  structural  ensembles.  Our  results  identify  discrepancies  in 
the  free-energy  landscape  between  implicit  and  explicit  solvent  and  evaluate  the  capability  of  the  hybrid 
approach  to  decrease  the  number  of  replicas  needed  for  REMD  while  reproducing  the  energy  landscape  of 
explicit  solvent  simulations. 


15.  SUBIECT  TERMS 


16.  SECURITY  CLASSIFICATION  OF: 

17.  LIMITATION  OF 

18.  NUMBER 

1 9a.  NAME  OF 

ABSTRACT 

OF  PAGES 

RESPONSIBLE  PERSON 

a.  REPORT 

unclassified 

b.  ABSTRACT 

unclassified 

c.  THIS  PAGE 

unclassified 

Same  as 
Report  (SAR) 

li 

Standard  Form  298  (Rev.  8-98) 

Prescribed  by  ANSI  Std  Z39-18 


Journal  of  Chemical  Theory  and  Computation 


ARTICLE 


traditional  temperature-based  replica  exchange  of  protein-sized 
systems. 

In  explicit  solvent  systems,  the  energy  is  dominated  by  bulk 
solvent— solvent  interactions.  One  common  approach  to  making 
these  systems  more  tractable  for  simulation  is  by  replacing  the 
solvent  molecules  with  continuum  solvation  model  that  approx¬ 
imates  the  free  energy  associated  with  solvent— solvent  and 
solvent— protein  interactions.  The  generalized-Born  (GB)  im¬ 
plicit  solvent  model  is  an  accurate,  efficient,  approximation  of  the 
more  rigorous  but  computationally  expensive  Poisson— Boltzmann 
(PB)  continuum  electrostatic  model.12  In  a  typical  GBSA 
(Generalized-Born  Surface  Area)  implicit  solvation  model,  the 
polar  contribution  to  the  solvation  free  energy  is  calculated  using 
the  GB  approximation,  while  the  nonpolar  contribution,  also 
termed  the  “hydrophobic”  effect,  is  captured  by  a  surface  area 
term.  The  decreased  system  size  due  to  the  absence  of  sol¬ 
vent  molecules  significantly  reduces  complexity.  For  example,  a 
10-residue  peptide,  which  requires  24  replicas  in  explicit  solvent,13 
requires  only  8  replicas  using  a  continuum  solvation  model.14,15 
Trp-cage,  which  requires  over  40  replicas  in  explicit  solvent,9 
required  17  replicas  using  the  GB  implicit  solvation  model.16 

Although  implicit  solvent  models  allow  for  significantly  faster 
simulations,  there  are  a  number  of  reported  discrepancies  be¬ 
tween  results  from  implicit  solvent  simulations  and  the  explicit 
solvent  simulations  they  were  intended  to  reproduce.  For  a 
number  of  force  fields  including  CFIARMM22  and  Amber  ff09, 
simulations  of  alanine  dipeptide  show  that  GB  solvation  dis¬ 
proportionately  favors  a-helical  dihedral  conformations  over 
/3  or  PPII  conformations.17-19  Other  studies  have  reported  that 
GB  models  can  induce  overstabilized  salt-bridges.  '  Simula¬ 
tions  of  short  peptides  have  revealed  discrepancies  in  ensemble 
properties  such  as  end-to-end  distance  and  radius  of  gyration 
(Rg)  distributions.17,22  Zhou  and  Berne  showed  significant  dif¬ 
ferences  in  the  free  energy  landscape  of  the  C-terminal  /3-hairpin 
of  protein  G  derived  from  an  implicit  GB  model  and  explicit 
solvent.23  Finally,  Lee  and  Olson  shows  that  the  Trp-cage  mini¬ 
protein  has  three  distinct  stable  conformations  in  GB16  which  is 
not  observed  in  explicit  solvent.9 

The  discrepancies  in  simulation  behavior  between  implicit  and 
explicit  solvent  models  can  be  attributed  to  two  related  issues. 
First,  do  continuum  solvation  models,  in  which  there  is  a  discrete 
dielectric  boundary  separating  protein  and  solvent  volumes, 
accurately  reflect  the  solvation  free  energies  for  a  given  protein 
conformation?  Second,  does  the  surface  area  term  in  the  GBSA 
model  capture  the  complex  effects  of  explicit  solvent  molecules 
on  the  conformational  space  sampled  by  the  protein  in  an 
implicit  solvent  simulation?  We  hypothesize  that  differences  in 
conformational  sampling  are  primarily  responsible  for  the  dis¬ 
crepancies  in  implicit  solvent  simulations  and  that  sampling  in 
explicit  solvent  should  be  sufficient  to  reproduce  explicit  solvent 
simulation  results. 

Toward  that  end,  we  implement  a  hybrid  implicit/explicit 
solvation  model  for  REMD  in  CHARMM.“4  In  this  method,  each 
replica  is  run  exclusively  in  explicit  solvent  using  the  CFLARMM22 
force  field.  During  an  exchange  attempt,  the  solvent  molecules 
are  removed,  and  the  energy  of  the  reduced  system  is  calculated 
using  the  CHARMM22  force  field  with  the  GBMV2  implicit 
solvation  model.12  Once  an  exchange  has  been  completed,  the 
solvent  molecules  are  reinserted  into  the  system,  at  their  previous 
positions  and  velocities,  and  the  replica  resumes  its  course.  Since 
each  replica  is  run  in  explicit  solvent  in  its  entirety,  sampling  is 
done  exclusively  along  the  free  energy  landscape  of  the  explicit 


solvent  system.  Since  exchanges  are  calculated  using  an  implicit 
solvation  representation  of  the  system,  the  energy  distribution  of 
each  replica  is  roughly  similar  to  the  distribution  in  the  implicit 
solvent  system.  This  reduces  the  number  replicas  needed  to  span 
a  given  temperature  range.  Theoretically,  the  GB  solvation  free 
energy  calculation  should  approximate  the  free  energy  contribu¬ 
tion  of  all  solvent  molecules  over  all  solvent  molecule  degrees 
of  freedom.  Ultimately,  by  sampling  exclusively  in  the  explicit 
solvent  domain  while  calculating  energies  for  exchanges  using  an 
implicit  solvation  model,  we  aim  to  reduce  the  number  of  replicas 
needed  for  efficient  REMD  while  accurately  reproducing  the 
results  of  the  more  computational  expensive  explicit  REMD 
simulation. 

We  apply  our  method  to  three  model  systems  to  measure  its 
efficiency  and  effectiveness  in  reproducing  results  derived  from 
explicit  REMD  simulations.  These  systems  include:  blocked 
alanine  dipeptide  (Ace-A-Nme),  zwitterionic  tetrapeptide  Ace- 
RAAE-Nme),  and  chignolin,  a  10-residue  /1-hairpin  peptide 
(GYDPETGTWG).  All  of  these  systems  have  been  modeled  in 
previous  MD  simulation  studies  and  each  has  shown  different 
behavior  in  implicit  and  explicit  solvation  models.1^17  20  For 
each  system,  we  carry  out  a  fully  implicit  REMD,  a  hybrid  REMD, 
and  a  fully  explicit  REMD  simulation  and  calculate  the  resulting 
energy  landscape  in  terms  of  dihedral  torsion  angle,  salt-bridge 
stability,  and  peptide  folding  stability.  Finally,  we  identify  dis¬ 
crepancies  in  the  energy  landscapes  from  the  three  solvent 
models  and  evaluate  the  ability  of  the  hybrid  method  in  reprodu¬ 
cing  explicit  solvent  results  at  a  fraction  of  the  computational  cost. 

There  have  been  two  prior  implementations  of  hybrid  implicit/ 
explicit  solvation  in  REMD.  The  first  implementation,  by 
Okur  et  al.,25  used  the  GB  implicit  solvation  model  from  Amber 
to  calculate  the  solvation  energy  of  a  reduced  representation  of 
the  system  that  included  at  least  two  shells  of  solvent  molecules 
during  exchanges.  They  modeled  alanine  polypeptides  of  various 
lengths  and  found  that,  for  small  systems,  there  were  significant 
efficiency  gains  when  using  the  hybrid  approach.  However,  the 
necessity  of  including  multiple  layers  of  solvent  molecules  in 
their  method  limits  the  efficiency  gains  and  introduces  nontrivial 
technical  challenges  in  the  application  to  larger  protein  systems. 
In  another  implementation  of  the  hybrid  implicit/ explicit  solva¬ 
tion  in  REMD,  Mu  et  al.22  used  a  reparameterized  version  of  the 
computationally  expensive  PB  solvation  model  to  circumvent 
the  need  to  include  multiple  layers  of  solvent  molecules  dur¬ 
ing  exchanges.  They  applied  their  method  to  a  range  of  systems 
including  small  peptides  and  the  Trp-cage  mini-protein  and 
showed  results  comparable  with  explicit  REMD.13,22,26  In  our 
implementation  of  the  hybrid  REMD  method,  we  apply  the 
GBMV212  implicit  solvation  model  to  leverage  the  computa¬ 
tional  speed  and  accuracy  of  GB  with  the  simplicity  of  excluding 
all  solvent  molecules  in  the  energy  calculation.  Since  GB,  unlike 
PB,  is  compatible  with  implicit  solvent  simulations,  we  will  be 
able  to  evaluate  the  accuracy  of  the  hybrid  method  vs  the  GB 
method  compared  to  an  explicit  solvent  benchmark. 

■  METHODS 

Temperature-Based  Replica  Exchange.  The  replica- 
exchange  protocol  involves  running  a  number  of  independent 
simulations,  or  replicas,  at  a  range  of  temperatures  that  periodi¬ 
cally  exchange  temperatures.  Each  replica,  a,  exchanges  its 
temperature  with  another  replica,  b,  if  0  or  exp(— A„(,) 

is  greater  than  a  random  number  generated  uniformly  between 
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Table  1.  Summary  of  REMD  Simulations 


system 

method 

temperature  range  (K) 

length  (ns) 

no.  of  clients 

-^water 

box  size  (A) 

CPU-hr/ns 

Ace-A-Nme 

implicit 

298-  500 

10 

4 

10 

hybrid 

298-500 

10 

4 

693 

27.53 

94 

explicit 

298-500 

10 

24 

693 

27.53 

550 

Ace-RAAE-Nme 

implicit 

298-500 

40 

S 

17 

hybrid 

298-500 

40 

S 

1324 

34.09 

200 

explicit 

298-500 

40 

24 

1324 

34.09 

990 

GYDPETGTGWG 

implicit 

250-450 

100 

7 

39 

hybrid 

250-450 

100 

7 

1180 

32.92 

270 

explicit 

250-450 

100 

32 

1180 

32.92 

1200 

0  and  1,  where  Aaj,  is  defined  as 


(Eb-Ea) 


(1) 


where  Ea  is  the  energy  of  replica  a  corresponding  to  temperature 
Ta  and  fcB  is  Boltzmann’s  constant.  In  practice,  exchanges  are  only 
attempted  between  replicas  in  adjacent  temperature  windows. 

Simulation  Setup.  We  studied  the  following  peptide  systems 
in  the  present  work:  blocked  alanine  dipeptide  (Ace-A-Nme), 
zwitterionic  tetrapeptide  Ace-RAAE-Nme),  and  chignolin,  a 
10-residue  /3-hairpin  peptide  (GYDPETGTWG).  These  systems 
were  selected  to  compare  the  three  REMD  methods:  implicit 
REMD  (imp-REMD),  hybrid  REMD  (hyb-REMD),  and  explicit 
REMD  (exp-REMD),  over  a  diverse  range  of  ensemble  measure¬ 
ments  as  well  as  in  protein  folding  and  structure  prediction. 
Details  of  the  MD  methods  used  in  this  work  are  as  follows. 

We  performed  MD  simulations  with  the  CHARMM 
program,24  version  c35b3.  We  used  the  CHARMM22  force  field 
with  the  CMAP  backbone  dihedral  cross-term  extension.27  We 
set  an  integration  time  step  of  2  fs  and  applied  the  SHAKE28 
algorithm  to  fix  all  covalent  bonds  with  hydrogen  atoms.  Non- 
bonded  electrostatics  and  van  der  Waals  interactions  were  trun¬ 
cated  smoothly  from  12  to  14  A.  For  implicit  solvent  simulations, 
we  used  Langevin  dynamics  with  a  friction  coefficient,  y,  set  to 
10  ps  1  for  all  heavy  atoms.  For  implicit  solvent  simulations,  we 
used  the  GBMV2  model  with  default  parameters  except  for  the 
MMTSB  option  gbmvaig  which  was  set  to  97,  which  greatly 
increases  the  grid  resolution  and  consequently  the  accuracy  of  the 
GB  calculation  during  exchange  attempts  in  the  hybrid  simulation.  A 
surface  tension  value  of  0.00542  kcal/ (mol- A2)  was  used  for  the 
solvent  accessible  surface  area  nonpolar  solvation  term.  For 
hybrid  and  explicit  solvent  simulations  we  used  a  Nose-Hoover 
thermostat  with  a  temperature  coupling  constant  of  50  kcal/s2. 

Replica  exchange  was  performed  using  a  modification  of  the 
MMTSB  toolset29  script  aarex.pl.  Analysis  of  replica-exchange 
results  was  facilitated  by  a  modified  version  of  the  MMTSB  script 
rexanalysis.pl,  which  automates  WHAM  analysis  of  the  REMD 
simulation  data.  The  temperature  ranges,  simulation  lengths,  sys¬ 
tem  sizes,  and  dimensions  are  listed  in  Table  1.  The  number  of 
replicas  used  varied  from  system  to  system,  depending  on  its  size. 
For  alanine  dipeptide,  zwitterionic  tetrapeptide,  pentapeptide, 
and  /3-hairpin  peptide,  the  number  of  replicas  four,  five,  five,  and 
seven,  for  implicit  solvent  and  hybrid  simulations  and  24,  24,  24, 
and  32,  for  explicit  solvent  simulations.  In  the  hybrid  and  explicit 
solvent  simulations,  peptides  were  solvated  in  a  cubic  box  with 
693,  1324,  1103,  and  1180  TIP3P  water  molecules  respectively, 
and  the  minimal  number  of  sodium  or  chloride  counterions  were 


added  to  neutralize  the  system.  We  ran  simulations  to  lengths 
described  in  Table  1,  and  exchanges  between  replicas  were 
attempted  once  every  1  ps.  In  implicit  solvent  simulations,  all 
structures  were  equilibrated  for  4  ps  prior  to  the  simulation.  For 
the  hybrid  and  explicit  solvent  simulations,  we  equilibrated  the 
solvated  systems  by  running  5  ps  simulations  at  50, 100,  150,  200, 
250,  275,  and  298  K,  at  a  constant  pressure  of  1  atm.  The  average 
cubic  box  dimensions  from  the  final  equilibration  run  were  used 
for  the  constant  volume  simulations  during  replica  exchange.  The 
starting  structure  for  all  peptides,  except  chignolin  was  an  ex¬ 
tended  conformation,  with  cp  =  —150°  and  ip  =  150°.  The  start¬ 
ing  structure  for  chignolin  was  the  first  conformer  in  the  NMR 
structure  (PDB  code  1UAO).30 

Hybrid  Replica  Exchange.  In  traditional  implicit  or  explicit 
solvent  REMD,  each  replica  is  run  in  the  implicit  or  explicit 
solvent  regime,  and  the  energy  corresponding  to  each  replica  that 
is  used  to  calculate  exchange  probability  is  derived  from  force 
field  parameters  used  within  each  replica.  In  our  hybrid  replica 
exchange  method,  each  replica  runs  identically  to  the  explicit 
solvent  REMD.  During  an  exchange  attempt  however,  all  water 
molecules  and  counterions  are  removed  from  a  given  replica,  and 
the  energy  of  the  replica  is  calculated  using  the  force  field  param¬ 
eters  from  implicit  solvent.  Once  an  exchange  attempt  is  com¬ 
pleted,  all  removed  molecules  are  inserted  back  at  the  positions 
and  velocities  prior  to  the  exchange  attempt,  and  the  replica 
continues  on.  The  number  of  replicas  needed  for  efficient  REMD 
is  a  function  of  the  distribution  of  energies  that  span  the  REMD 
temperature  range,  and  smaller  systems,  such  as  those  in  implicit 
solvent,  have  a  narrower  range  of  energies  over  that  temperature 
span.  By  sampling  in  explicit  solvent,  but  calculating  exchanges  in 
implicit  solvent,  we  are  able  to  run  an  explicit  solvent  REMD 
simulation  with  the  same  number  of  replicas  used  in  implicit 
solvent  REMD.  The  energy  calculation  for  explicit  is  shown  in 
eq  2,  and  compared  with  the  hybrid  and  implicit  in  eq  3: 

rj  _  t  TProt  -  prot  .  t  Tprot  —  solv  j-j-solv-solv  f^\ 

r- explicit  —  UMM  UMM  '  UMM  l7! 


nplicit 


=  u 


prot 


AUi 


•solv 


(3) 


We  implemented  a  “semi”-hybrid  method  based  as  outlined  by 
Okur  et  al.25  specifically  to  further  investigate  salt-bridge  stabi¬ 
lization  in  the  hybrid  REMD  method.  The  semihybrid  method  is 
identical  to  the  hybrid  method  described  above  except  that  a 
prespecified  number  of  water  molecules  closest  to  the  protein  are 
kept  during  the  energy  calculation  in  an  exchange  attempt.  This 
semihybrid  method  was  applied  to  the  zwitterionic  peptide 
system  and  the  nearest  200  water  molecules  were  kept  during 
exchange  calculations.  MMTSB  Perl  scripts  for  both  the  hybrid 
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Figure  1.  Energy  distributions  for  each  replica  in  (a)  implicit,  (b)  explicit, 
and  (c)  hybrid  methods,  as  well  as  (d)  the  explicit  method  with  the  same 
temperature  replicas  used  in  the  implicit  and  hybrid  methods.  The 
distribution  furthest  to  the  left  in  each  plot  corresponds  to  the  lowest 
temperature;  the  distribution  furthest  to  the  right  corresponds  to  the 
highest  temperature. 


and  semi-hybrid  REMD  methods  are  publicly  available  at  our 
website:  http://www.bhsai.org/downloads/hybrid_remd/. 

Evaluation  Metrics.  We  measured  peptide  structure  and 
ensemble  characteristics  according  to  a  variety  of  metrics.  We 
described  ensemble  behavior  largely  through  probability  distri¬ 
butions  of  composite  trajectories  at  298  K  and  potential  of  mean 
force  (PMF)  calculations  made  usin^  the  temperature-based- 
weighted  histogram  analysis  method31  (WHAM)  using  all  tem¬ 
perature  replicas.  All  measurements  and  PMF  calculations  were 
facilitated  by  a  modified  version  of  the  MMTSB  toolset  script 
rexanalysis.pl.  The  following  structural  measurements  were 


binned  in  the  WHAM  algorithm.  For  alanine  dipeptide,  the  (p 
and  ip  backbone  torsion  angles  were  used.  For  the  zwitterionic 
tetrapeptide,  radius  of  gyration  ( Rg )  and  the  salt-bridge  distance 
(rsb);  defined  as  the  distance  between  the  Cg  atom  of  Argl  and 
atom  of  Glu4,  were  used.  For  chignolin,  the  Ca  root  mean 
squared  distance  (rmsd)  to  the  first  conformer  in  the  NMR 
structure30  (PDB  code  1UAO),  following  optimal  superposition, 
was  used,  as  well  as  distance  between  the  N  atom  of  Glyl  and 
the  C  atom  of  GlylO.  We  calculated  melting  curves  based  on  a 
WFLAM  analysis  at  10  K  intervals  from  250  to  450  K.  Standard 
deviations,  shown  in  parentheses,  are  determined  from  running 
the  identical  analysis  over  the  trajectory  data  blocked  into  thirds. 

We  pooled  250  ps  snapshots  from  composite  intermediate 
temperature  trajectories  of  chignolin  from  all  three  REMD 
mehods  and  clustered  the  resulting  peptides  structures  using 
the  jclust  hierarchical  clustering  algorithm  using  the  MMTSB 
Tool  Set  script  cluster.pl.  We  limited  analysis  to  three  levels  of 
clustering.  The  top-level  clustering  identified  two  clusters,  one 
corresponding  to  the  folded  state  and  the  other  to  the  unfolded 
state.  The  bottom-level  cluster  centers  derived  from  the  folded 
and  unfolded  clusters  were  then  reclustered  using  a  fixed-radius 
algorithm  with  a  radius  of  0.5  A  and  3.0  A  Ca  rmsd  for  folded  and 
unfolded  clusters.  Clusters  representing  less  than  2  ns  (8  snapshots) 
of  simulation  time  for  at  least  one  method  were  thrown  out  of  the 
analysis.  Approximately  99%  of  snapshots  were  represented  by 
one  of  the  bottom-level  clusters.  Cluster  centers  are  represented 
by  the  snapshot  in  each  cluster  with  the  lowest  Ca  rmsd  to  that 
cluster’s  centroid  coordinates. 

■  RESULTS 

Computational  Efficiency.  The  primary  goal  of  the  hybrid 
REMD  method  is  to  reduce  the  number  of  replicas  needed  to  run 
in  explicit  solvent  while  maintaining  the  overall  accuracy  of 
explicit  solvent  MD.  The  number  of  replicas  needed  for  efficient 
exchanges  in  REMD  is  a  function  of  the  overlap  in  energy 
distributions  of  adjacent  replicas  in  a  given  temperature  range, 
the  larger  the  overlap,  the  higher  the  exchange  rate.8  Figure  1 
illustrates  the  energy  distribution  from  the  alanine  dipeptide 
simulation  for  implicit  solvent,  hybrid,  and  explicit  solvent 
REMD  methods,  as  well  as  the  explicit  method  only  the 
temperature  clients  used  in  the  implicit  and  hybrid  methods.  In 
the  implicit  solvent  and  hybrid  REMD  methods,  the  energies 
span  from  appoximately  —20  to  +10  kcal/mol  between  the  298 
and  500  K  temperatures.  For  the  explicit  solvent  method,  these 
energies  span  from  —7400  to  —5800  kcal/mol  over  that  same 
temperature  span.  The  slightly  lower  energy  distribution  of  the 
hybrid  method  compared  to  the  implicit  method  can  be  attrib¬ 
uted  to  the  use  of  a  Nose-Hoover  thermostat  for  a  solvated 
system  consisting  of  2098  atoms  in  hybrid  compared  to  a 
Langevin  thermostat  for  an  unsolvated  system  of  22  atoms  in 
implicit  solvent. 

The  results  show  that  the  hybrid  method  is  successful  in  repro¬ 
ducing  the  energy  distributions  of  the  implicit  solvent  method 
and  the  corresponding  exchange  rates,  while  running  each  replica 
in  explicit  solvent.  It  is  clear  that  running  four  replicas  in  explicit 
REMD  is  insufficient  for  adequate  exchanges  as  there  is  a  large 
gap  between  the  energy  distributions  of  adjacent  temperature 
replicas.  The  results  suggest  that  the  hybrid  method  can  run 
explicit  solvent  replicas  at  approximately  1  / 5th  of  the  computa¬ 
tional  cost  of  the  explicit  method  for  this  system.  These  results 
are  representative  of  all  the  systems  modeled  in  this  study. 


680 


dx.doi.org/10.1021/ct200529b  | J.  Chem.  Theory  Comput.  2012, 8, 677-687 


Journal  of  Chemical  Theory  and  Computation 


ARTICLE 


150 

100 
.  50 
1  0 
-50 

-100 

-150 


1 

-150  -100  -50  0  50  100  150 


Phi  (degrees) 


Phi  (degrees) 


Phi  (degrees) 


Figure  2.  PMF  with  regard  to  backbone  torsion  angles  (p  and  xp  for  (a)  implicit,  (b)  explicit,  and  (c)  hybrid  REMD  methods.  For  comparison,  the 
(d)  adiabatic  map  generated  using  the  same  force  field  parameters  as  in  implicit  method  for  all  values  of  cp  and  xp,  in  5°  bins,  confirms  that  all  major 
minima  are  sampled. 


Alanine  Dipeptide  Simulation.  Alanine  dipeptide  has  long 
been  used  as  a  model  system  for  studying  protein  backbone 
conformational  sampling  in  molecular  dynamics.19  A  number  of 
studies  have  highlighted  small  but  significant  differences  in  con¬ 
formational  sampling  based  on  the  solvent  model  used.  We  ran 
10  ns  simulations  starting  from  the  extended  peptide  conforma¬ 
tion  using  the  implicit,  hybrid,  and  explicit  REMD  methods  and 
collected  data  from  2  to  10  ns.  From  each  of  the  three  methods, 
we  computed  the  resulting  PMF  along  the  (p  and  xp  backbone 
torsion  angles  to  identify  any  systematic  differences  in  the  con¬ 
formational  free  energy  landscape  between  implicit  and  explicit 
methods  and  evaluate  the  ability  of  the  hybrid  method  to  accom¬ 
modate  these  differences  in  accurately  reproducing  the  results 
of  the  explicit  solvent  simulation.  The  three  PMF  diagrams  are 
illustrated  in  Figure  2. 

Overall,  the  free  energy  landscapes  are  similar  for  all  three 
methods  with  the  right-handed  a-helical  region  aR  (—65°,  —50°) 
as  the  most  favorable,  but  the  results  confirm  small  but  significant 
differences  between  the  implicit  solvent  and  explicit  solvent 
energy  landscape.  Specifically,  in  implicit  solvent,  the  PPII  region 
(—65°,  150°)  has  lower  energy  relative  to  the  /1-sheet  region 
(—150°,  150°),  and  a-helical  region  aj  (0°,  100°)  has  lower 


Table  2.  Distribution  of  Conformations  at  298  K  for  the 
Three  REMD  Methods  for  Alanine  Dipeptide 


conformation  (%)  [s.d.] 

method 

ai/aL 

/i/PPII 

Hr 

imp-REMD 

45.3  [3.0] 

32.8  [3.2] 

2.1  [0.5] 

hyb-REMD 

34.6  [2.5] 

33.8  [1.1] 

5.7  [1.9] 

exp-REMD 

36.4  [1.6] 

34.8  [0.5] 

4.5  [1.0] 

energy  compared  to  explicit  solvent.  Finally,  the  left-handed 
a-helical  region  aL  is  higher  energy  in  the  implicit  solvent.  The 
hybrid  method,  despite  using  the  implicit  solvent  model  during 
exchanges,  successfully  reproduces  the  explicit  solvent  energy 
landscape,  and  matches  the  explicit  solvent  results  with  regard  to 
these  discrepancies. 

The  overall  effect  of  the  differences  in  the  energy  landscape 
of  alanine  dipeptide  is  an  over-representation  of  a-helical 
conformations  in  the  298  K  ensemble.  Table  2  shows  the  con¬ 
formation  distribution  between  three  largest  conformational 
clusters,  left-handed  a-helical  (a!  and  aL  basins),  /3-sheet 
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(/3  and  PPII  basins),  and  right-handed  ct-helical  (ctR  basin),  and 
indicates  that  the  ensemble  at  298  K  generated  by  the  hybrid 
method  most  closely  matches  the  explicit  solvent  results.  The 
results  of  the  alanine  dipeptide  simulations  demonstrate  that  there 
are  systematic  differences  in  conformational  sampling  between  the 
implicit  and  explicit  solvent  models  and  that  the  explicit  solvent 
sampling  in  the  hybrid  approach  is  able  to  successfully  reproduce 
the  overall  free  energy  landscape  of  explicit  method,  despite  using 
an  implicit  solvent  model  for  energy  calculation  during  exchanges. 

Zwitterionic  Tetra-Peptide  Simulation.  A  number  of  studies 
have  identified  the  overstabilization  of  salt-bridges  as  a  significant 
discrepancy  between  implicit  and  explicit  solvent  simulations. 
We  used  the  three  REMD  methods  with  a  zwitterionic  tetrapeptide 
that  is  flanked  by  a  positively  charged  arginine  residue  and  a  nega¬ 
tively  charged  glutamate  residue.  We  ran  50  ns  simulations  start¬ 
ing  from  an  extended  peptide  conformation  for  each  the  three 
REMD  methods  and  collected  data  from  10  to  50  ns.  We  calcu¬ 
lated  the  PMF  for  each  simulation  along  the  salt-bridge  distance, 
defined  as  the  distance  between  Cg  atom  of  Arg  1  and  C&  atom  of 
Glu  4  (Figure  3)  and  compare  overall  percentage  of  conforma¬ 
tions  that  contain  a  salt -bridge  in  the  composite  298  K  trajectory 
for  each  method  in  Table  2. 

Figure  3  illustrates  the  PMF  with  respect  to  salt-bridge  dis¬ 
tance  for  all  three  methods,  normalized  to  the  height  of  the 
energy  barrier  at  5.0  A.  A  deep  energy  well  is  observed  at  a  dis¬ 
tance  of  approximately  3.5  to  4.0  A,  corresponding  to  the 
formation  of  the  salt-bridge.  The  relative  depths  of  this  minimum 
is  —2.0,  —3.1,  and  —3.2  kcal/mol  for  explicit,  hybrid,  and  implicit 
methods,  respectively,  indicating  that  both  the  hybrid  and 
implicit  REMD  methods  display  substantial  salt-bridge  stabiliza¬ 
tion  relative  to  the  explicit  REMD  simulation.  At  salt-bridge 
distances  greater  than  4.0  A,  both  the  explicit  and  hybrid  methods 


“““imp-REMD 
—  hyb-REMD 
—exp-REMD 


Figure  3.  PMF  at  298  K  with  respect  to  salt-bridge  distance  for  the 
implicit,  hybrid,  and  explicit  REMD  methods. 


show  two  small  minima  at  6.6  and  9.5  A  corresponding  to  single 
and  double-shelled  solvent  separation.  Figure  4  illustrates  repre¬ 
sentative  structures  from  the  300  K  composite  trajectory  from 
the  hybrid  method  for  the  salt-bridge  minimum,  as  well  as  at  each 
of  the  two  solvent  separated  minima.  These  minima  highlight  the 
structured  water  effects  of  explicitly  sampled  solvent  in  the  hybrid 
approach. 

We  compared  the  observed  probability  distribution  of  salt- 
bridge  formation  across  the  three  REMD  methods  in  Table  3.  As 
expected,  the  explicit  method  demonstrated  the  lowest  propen¬ 
sity  for  salt-bridge  formation,  at  58%.  The  implicit  method 
showed  substantially  higher  salt-bridge  propensity  at  80%,  and 
surprisingly,  the  hybrid  method  showed  the  highest  propensity 
for  salt-bridge  formation  at  90%.  This  trend  is  reflected  in  the 
earlier  PMFs  which  show  that  the  hybrid  method  has,  by  far,  the 
largest  difference  between  the  free  energy  of  the  salt-bridged 
conformations  and  the  rest  of  the  ensemble. 

Overall,  the  zwitterionic  tetrapeptide  results  suggest  that  while 
the  hybrid  method  is  able  to  reproduce  features  of  the  energy 
landscape  that  result  from  explicit  water  modeling,  such  as  solvent- 
separated  minima,  the  resulting  thermodynamics  can  be  pro¬ 
foundly  affected  by  the  implicit  GB  model  used  in  the  hybrid 
scheme.  We  implemented  a  ‘semi’  hybrid  approach  outlined  by 
Okur  et  al.2S  specifically  to  study  the  role  of  that  a  shell  of  explicit 
water  molecules  can  play  on  the  salt-bridge  stabilization  observed 
in  the  hybrid.  We  carried  out  simulation  with  the  same  length, 
system-size  and  number  of  replicas  as  in  the  hybrid  method  and 
calculate  PMFs  with  respect  to  both  salt-bridge  distance  (Supporting 
Information  Figure  l)  and  radius  of  gyration  (Supporting  In¬ 
formation  Figure  2).  The  results  show  that  the  inclusion  of  an 
explicit  water  shell  in  the  semihybrid  method  largely  eliminated 
the  salt-bridge  stabilization  observed  in  the  hybrid  method. 

Chignolin /3-Hairpin  Peptide  Simulation.  The  computation¬ 
ally  designed  10-residue  chignolin  peptide  forms  a  structurally 
defined  /3-hairpin  and  has  been  increasingly  used  to  study  peptide 
folding  and  thermodynamics  in  computational  simulations.  This 
system  contains  a  number  of  protein-like  features,  including  an 
antiparallel  /3-sheet  with  a  /3-turn,  a  salt-bridge,  and  a  hydrophobic 


Table  3.  Distribution  of  Salt-Bridge  Formation  at  298  K  for 
the  Three  REMD  Methods  for  Zwitterionic  Tetra-peptide 

method  salt-bridge  (%)  [s.d.] 


imp-REMD 

hyb-REMD 

exp-REMD 


80.1  [5.6] 

90.1  [2.1] 
57.6  [0.0] 


Figure  4.  Representative  structures  from  the  hyb-REMD  simulation  at  a  salt-bridge  distance  of  3.8,  6.6,  and  9.5  A 
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aromatic  stacking  interaction,  making  it  an  ideal  test  case  for 
comparing  solvent  methods  in  REMD. 

We  ran  100  ns  simulations,  starting  from  the  folded  structure, 
for  each  of  the  three  REMD  methods.  While  all  clients  started 
from  the  folded  conformation,  by  50  ns  each  method  had  reached 
an  equilibrium  between  the  folded  and  unfolded  states  and  all 
the  analysis  was  done  from  data  obtained  from  50  to  100  ns. 
We  calculated  PMFs  with  respect  to  the  C a  rmsd  to  the  folded 
structure,  the  salt-bridge  distance,  and  the  minimum  distance 
between  the  indole  and  phenol  rings  of  Trp2  and  Tyr8.  We  also 
extracted  snapshots  at  250  ps  intervals  and  carried  out  hierarchi¬ 
cal  clustering  to  identify  the  major  conformations  populated  by 
each  simulation  method  at  low  (250  K),  intermediate  (370  K), 
and  high  temperatures  (450  K).  A  summary  of  the  results  for 
chignolin  can  be  found  in  Table  4. 

Figure  5  illustrates  the  two-dimensional  PMF  with  respect  to 
salt-bridge  distance  and  C  a  rmsd  for  each  of  the  three  methods  at 
298  K.  These  PMFs  can  serve  as  a  “fingerprint”  that  describes  the 
energy  landscape  of  the  overall  conformational  ensemble  for 
each  method.  All  three  PMFs  show  a  deep  basin  corresponding 
with  the  folded  structure  with  an  rmsd  of  between  0.5  and  2.5  A. 
Likewise,  all  three  methods  show  a  broad,  shallow,  basin  corre¬ 
sponding  to  the  unfolded  state  with  rmsd  great  than  3.0  A.  The 

Table  4.  Summary  of  Structural  and  Thermodynamic  Prop¬ 
erties  for  Chignolin 


method 

folded  (%) 
[s.d.] 

salt-bridge  (%) 

[s.d.] 

AG  at  298  K 

(kcal/mol)  [s.d.] 

Tm  (K) 

imp-REMD 

99.7  [0.2] 

97.8  [1.1] 

-2.8  [0.3] 

375 

hyb-REMD 

94.0  [1.4] 

93.7  [3.2] 

-1.6  [0.1] 

345 

exp -REMD 

75.0  [4.8] 

90.3  [3.5] 

-0.6  [0.1] 

384 

explicit  method  has  significantly  lower  free  energies  in  the 
unfolded  basin  compared  to  implicit  method,  with  the  hybrid 
method  falling  in  between.  Finally,  both  the  explicit  and  hybrid 
methods  show  alternate  minima  corresponding  to  folded  struc¬ 
tures  which  lack  a  salt-bridge,  and  semifolded  structures,  with 
RMSDs  between  2.5  and  3.5  A  that  are  largely  absent  in  implicit 
simulation. 

Using  WHAM  analysis,  we  calculated  a  theoretical  melting 
curve  (Figure  6)  by  generating  a  PMF  with  respect  to  rmsd  at 
10  K  intervals  from  250  to  450  K.  We  defined  a  folded  conforma¬ 
tion  as  one  with  a  Ca  rmsd  of  less  than  2.5  A,  and  calculated  the 
fraction  folded  based  on  the  free  energy  profiles  at  each  tem¬ 
perature.  The  explicit  method  shows  an  atypical  melting  curve 
that  is  broad  and  relatively  flat,  with  a  melting  temperature  (Tm) 
of  384  K.  This  is  in  sharp  contrast  to  the  experimentally  deter¬ 
mined  melting  curve30  but  consistent  with  previous  explicit  sol¬ 
vent  REMD  simulations  of  chignolin.13  The  hybrid  and  implicit 
melting  curves  display  a  more  typical  sigmoidal  form  with  Tm 
values  of  375  and  345  K,  respectively.  Surprisingly,  while  the 
hybrid  method’s  melting  curve  is  a  similar  shape  as  the  implicit 
method  melting  curve,  it  has  a  significantly  lower  Tm  than  both 
the  implicit  and  explicit  methods.  The  calculated  folding  free 
energy  at  298  K  was  —0.6,  —1.6,  and  —2.8  kcal/mol  for  explicit, 
hybrid,  and  implicit  methods,  respectively,  with  the  hybrid  meth¬ 
od  falling  in  between  the  implicit  and  explicit  simulation  results. 

We  calculated  two-dimensional  PMFs  at  the  theoretical  Tms 
for  each  method  (Figure  5)  to  observe  the  conformational 
landscape  where  the  folded  and  unfolded  populations  are  equal. 
The  overall  features  of  the  energy  landscape  are  largely  similar  to 
the  PMF  calculated  at  298  K,  but  the  relative  depths  of  the  energy 
basins  differ.  At  the  melting  temperature,  the  hybrid  method  is 
strikingly  similar  to  the  explicit  solvent  method,  with  a  substantial 
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Figure  5.  PMF  of  Ca  rmsd  and  salt-bridge  distance  for  explicit,  hybrid,  and  implicit  REMD  methods  at  both  298  K  (a,  b,  and  c,  respectively)  and  at  each 
method’s  melting  temperatures  (d,  e,  and  f,  respectively). 
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population  of  semifolded  states.  By  contrast,  even  at  the  melting 
temperature,  the  implicit  method  is  almost  devoid  of  these  semi¬ 
folded  conformations,  with  a  continuous  high-energy  barrier  be¬ 
tween  the  folded  and  unfolded  states. 

To  understand  this  phenomenon,  we  identified  the  conforma¬ 
tions  that  make  up  these  folded,  semifolded  and  unfolded  popu¬ 
lations  by  performing  a  clustering  analysis.  We  selected  trajec¬ 
tories  for  each  simulation  method  from  the  temperature  window 
that  was  closest  to  the  respective  Tm  values,  which  were  369  K  for 


^^“imp-REMD 

hyb-REMD 

exp-REMD 


Figure  6.  Simulated  melting  curves  from  the  implicit,  hybrid,  and 
explicit  REMD  methods. 


the  implicit  and  hybrid  methods  and  372  K  for  the  explicit 
simulation,  and  collected  snapshots  at  250  ps  intervals.  We 
pooled  these  structures  and  carried  out  hierarchical  clustering 
based  on  pairwise  heavy-atom  RMSDs.  Figure  7  summarizes  the 
clustering  analysis,  illustrating  conformations  at  the  center  of 
each  cluster,  and  their  respective  C„  rmsd  to  the  folded  structure. 
The  top-level  clustering  resulted  in  a  folded  and  unfolded  cluster 
(hereafter  referred  to  as  parent  clusters).  The  bottom-level  clus¬ 
ter  centers  from  these  parent  clusters  were  then  reclustered  using 
a  fixed  radius  clustering  algorithm  (see  Methods).  Our  analysis 
identified  three  folded  clusters,  FI,  F2,  and  F3,  whose  cluster 
centers  had  native  Ca  RMSDs  of  1.0  A,  1.1  A,  and  1.4  A,  respec¬ 
tively.  Satoh  et  al.15  identified  three  hydrogen  bonds  that  define 
the  folded  state:  one  between  Asp3:0  and  Gly7:N,  one  between 
Asp3:N  and  Thr8:0,  and  one  between  the  side-chain  carboxyl 
group  of  Asp3  (Asp3:Od)  and  Glu5:N. 

We  identified  four  unfolded  clusters,  termed  Ul,  U2,  U3,  and 
U4,  with  cluster  centers  that  have  RMSDs  of  2.6,  4.2,  4.1,  and 
6.1  A,  respectively.  Clusters  Ul  and  U2  could  be  termed  “semi¬ 
folded”  clusters  as  they  contain  native  hydrogen  bonds.  50%  of 
conformations  in  cluster  Ul  contain  the  Asp3:0-Gly7:N  hydrogen 
bond,  and  16%  contain  both  the  Asp3:0-Gly7:N  andAsp:N-Thr8:0 


Figure  7.  Structures  of  each  cluster  center,  Ca  rmsd  to  the  folded  structure,  and  relative  population  among  folded  (top)  and  unfolded  (bottom)  species 
for  each  method.  Native  hydrogen  bonds  are  shown  in  black,  aromatic  residues  Tyr2  and  Trp9  are  shown  as  spheres. 
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Table  5.  Summary  of  Average  Structural  Properties  and  Hydrogen  Bonding  in  Each  Cluster  for  Chignolin 


average  properties  (S.D.) 

native  hydrogen  bonds  present  (%) 

cluster 

rmsdfl  (A)  [s.d.] 

Jig  (A)  [s.d.] 

Asp3:0— Gly7:N 

Asp3:N— Thr8:0 

Asp3:0<5— Glu5:N 

FI 

1.78  [0.30] 

5.00  [0.09] 

99 

87 

47 

F2 

2.10  [0.40] 

5.00  [0.12] 

93 

68 

50 

F3 

2.25  [0.53] 

5.00  [0.15] 

98 

94 

44 

U1 

4.18  [0.51] 

5.33  [0.41] 

50 

16 

39 

U2 

5.61  [0.52] 

5.41  [0.44] 

81 

0 

17 

U3 

5.51  [0.77] 

6.22  [0.49] 

0 

0 

0 

U4 

6.71  [0.83] 

6.88  [0.84] 

0 

0 

0 

a  Heavy- atom  rmsd. 

hydrogen  bonds.  Likewise,  almost  80%  of  conformations  in 
cluster  U2  contain  the  Asp3:0-Gly7:N  hydrogen  bond.  Clusters 
U3  and  U4  lack  any  native  hydrogen  bonds,  and  conformations 
in  cluster  U4  were  less  compact  than  U3,  with  average  radii  of 
gyration  of  6.9  A  (s.d.  0.8  A)  compared  to  6.2  A  (s.d.  0.5  A). 
Table  5  summarizes  structural  properties  of  conformations  within 
each  cluster. 

In  addition  to  clustering  of  the  pooled  snapshots  of  all  three 
methods,  we  analyzed  the  composition  of  each  cluster  with 
respect  to  the  method  that  each  snapshot  was  derived  from. 
Figure  7  shows  the  relative  population  of  each  subcluster  with 
respect  to  parent  cluster.  Among  folded  parent  cluster,  all  three 
methods  show  comparable  composition  within  each  cluster. 
Among  the  unfolded  parent  cluster,  there  are  significant  differences 
between  the  implicit  method  and  the  hybrid  and  explicit  methods. 
In  particular,  12%  and  11%  of  the  unfolded  cluster  members  from 
hybrid  and  explicit  methods  fall  into  the  semifolded  U1  cluster, 
compared  to  0%  of  the  implicit.  Likewise,  52%  of  unfolded  cluster 
members  from  the  implicit  method  fell  into  the  unfolded  cluster 
U3,  compared  to  32%  and  22%  from  hybrid  and  explicit  methods, 
respectively.  These  results  suggest  that  the  implicit  method,  sig¬ 
nificantly  favors  the  unfolded  state  relative  to  the  semifolded  states, 
compared  to  the  explicit  and  hybrid  methods. 

■  DISCUSSION 

Continuum  solvation  models  such  as  generalized  Bom  have 
become  increasingly  popular  as  ways  to  make  larger  biomolecular 
systems  tractable  for  computational  modeling,  particularly  in  REMD 
methods.  However,  there  have  been  a  number  of  reported  dif¬ 
ferences  between  the  free  energy  landscape  of  such  implicit 
solvent  models  and  the  explicit  solvent  simulations  they  are 
designed  to  replicate.  Here  we  present  a  hybrid  REMD  method 
that  seeks  to  combine  the  computational  efficiency  of  implicit 
solvent  models  while  maintaining  the  accuracy  of  explicit  solvent 
simulations.  Toward  that  end,  we  model  a  number  of  peptide 
systems,  identify  discrepancies  in  the  free  energy  landscape  based 
on  a  number  of  metrics  from  backbone  torsion  angles,  to  salt- 
bridge  distance,  and  folding  stability,  and  evaluate  the  hybrid 
REMD  method  to  reproduce  explicit  solvent  simulation  results. 

As  has  been  observed  previously,  the  acceptance  rate  of 
REMD  simulations  is  a  function  of  the  system  size.  Our  hybrid 
REMD  approach  successfully  reduces  the  number  of  clients  nec¬ 
essary  to  run  an  explicit  solvent  simulation  to  that  of  an  implicit 
solvent  simulation  and  maintains  the  overall  replica  exchange 
acceptances  rate  of  the  implicit  solvent  model.  This  is  achieved  by 


reducing  the  system  size  associated  with  Metropolis  exchanges  to 
that  of  an  implicit  solvent  system.  Ultimately,  the  utility  of  the 
hybrid  approach  rests  on  whether  this  reduction  in  clients  comes 
at  the  cost  of  decreased  accuracy  compared  to  explicit  solvent 
simulations  or  experimental  results. 

In  the  smallest  model  system,  alanine  dipeptide,  we  observed 
small  but  significant  differences  in  the  energy  landscape  between 
implicit  and  explicit  solvent  simulations  with  regards  to  the  (p  and 
( p  torsion  angles.  Overall,  the  implicit  solvent  model  tended  to 
overstabilize  a-helical  conformations  relative  to  /3-strand  and 
PPII  conformations,  a  result  that  has  been  observed  previously.18 
Our  hybrid  approach  successfully  reproduced  the  energy  land¬ 
scape  of  the  explicit  solvent  system  with  the  relative  energies  at 
each  minima,  and  population  distributions  within  the  margin  of 
error  compared  to  the  explicit  solvent  simulations.  Although  this 
represents  only  a  modest  improvement  in  accuracy  for  this 
simple  system,  the  cumulative  sampling  effects  of  such  differ¬ 
ences  in  larger,  more  complex  biomolecular  systems  could  lead  to 
significant  differences  in  the  energy  landscape. 

One  of  the  key  reported  differences  between  GB-based  impli¬ 
cit  solvent  models  and  explicit  solvent  simulations  is  the  over¬ 
stabilization  of  salt-bridges.  Using  the  zwitterionic  tetra-peptide 
Ace-RAAE-Nme,  we  sought  to  identify  the  degree  to  which  this 
overstabilization  occurs  in  the  GBMV2  implicit  solvation  model 
and  evaluate  the  hybrid  method’s  capabilities  in  overcoming  this 
deficiency.  As  expected  the  implicit  method  showed  a  signifi¬ 
cant  overstabilization  of  the  salt-bridge  and  lacked  the  distinct 
solvent-separated  minima  that  are  present  in  the  explicit  method 
results.  The  hybrid  method  showed  significant  overstabilization 
of  the  salt-bridge,  comparable  to  that  of  the  implicit  method,  but 
was  able  to  recover  solvent-separated  minima  found  in  the 
explicit  solvent  PMF.  These  results  suggest  that  while  the  hybrid 
method  is  able  to  recover  features  of  the  explicit  solvent  energy 
landscape,  the  GB  solvent  model  used  to  calculate  exchanges 
can  still  have  a  large  effect  on  the  resulting  thermodynamics. 
Okur  et  al.20  reported  similar  mixed  results  with  their  implementa¬ 
tion  of  a  hybrid  solvation  method,  observing  that  a  thorough 
analysis  of  salt-bridge  overstabilization  was  confounded  by  sys¬ 
tematic  differences  in  backbone  sampling  in  unrestrained  implicit 
and  explicit  solvent  REMD  simulations. 

We  used  the  ten  residue  chignolin  peptide  as  a  model  system 
because  it  contains  protein-like  structure  features  including  an 
antiparallel  /3-sheet,  a  /3-hairpin,  a  salt-bridge,  and  an  hydropho¬ 
bic  packing  interaction  between  two  aromatic  residues.  We 
started  each  REMD  simulation  from  the  folded  state  and  sought 
to  reproduce  explicit  solvent  structural  and  thermodynamic 
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properties  of  the  system.  Overall,  the  hybrid  method  reproduced 
a  number  of  features  of  the  explicit  solvent  energy  landscape 
compared  to  the  implicit  solvent  simulation,  particularly  with 
regards  to  semifolded  conformations.  Likewise,  the  calculated 
folding  free  energy  from  the  hybrid  method  was  closer  to  explicit 
solvent  results  and  experiment,  while  the  implicit  solvent  simula¬ 
tion  resulted  in  a  significantly  overstabilized  peptide.  This  could 
be  due  to  the  presence  of  a  salt-bridge  between  the  N  and 
C-termini  of  the  peptide  in  all  three  simulation  methods  which  is 
largely  unobserved  in  the  NMR  structure.30 

A  comparison  of  the  simulated  melting  curves  from  each 
method  for  chignolin  showed  disparate  results;  the  hybrid  and 
implicit  methods  showed  a  sigmoidal  curve,  while  the  explicit 
method  showed  a  flat  curve,  relatively  insensitive  to  temperature. 
The  unusual  explicit  solvent  melting  curve  behavior,  which  has 
been  observed  previously,  diverges  significantly  from  the  experi¬ 
mental  melting  curve30  and  suggests  potential  deficiencies  in 
TIP3P  water  as  a  model  for  high-temperature  water  molecules.  It 
is  important  to  note  that  while  explicit  solvent  REMD  inherently 
accounts  for  some  of  the  differences  in  the  free  energy  of  solva¬ 
tion  at  higher  temperatures,  the  solvation  free  energies  calculated 
using  the  GB-solvation  model  are  temperature-invariant.  As  in 
the  zwitterionic  tetrapeptide  system,  the  GB  solvent  model  ap¬ 
pears  to  have  a  significant  effect  on  the  thermodynamics  of  the 
hybrid  system,  influencing  the  path  through  temperature  space 
that  its  explicit  solvent  trajectory  travels,  and  ultimately  affecting 
the  composition  and  behavior  of  the  room-temperature  ensemble. 

Finally,  we  sought  to  characterize  the  structural  ensembles 
sampled  through  the  three  simulation  methods  for  chignolin  to 
see  if  any  benefits  arise  from  the  explicit  solvent  sampling  of 
the  hybrid  method  versus  implicit  solvent.  Specifically,  we  were 
interested  in  conformations  that  reflected  transitions  between 
the  folded  and  unfolded  states.  We  pooled  snapshots  from  all 
three  methods  taken  from  composite  trajectories  at  the  respec¬ 
tive  melting  temperatures  and  employed  a  hierarchical  clustering 
strategy  to  characterize  the  resulting  ensembles.  We  chose  a  two- 
stage  clustering  scheme  that  separately  treated  the  folded  and 
unfolded  clusters  and  identified  several  folded  and  unfolded 
cluster  centers.  All  three  methods  accessed  the  folded  clusters 
with  comparable  frequency.  Among  the  unfolded  clusters,  we 
identified  two  semifolded  clusters  that  contained  partial  features 
of  the  folded  peptide,  including  critical  backbone  hydrogen 
bonds  that  stabilize  the  /3-hairpin.  We  found  that  the  hybrid 
method  reproduces  the  explicit  solvent  results  in  populating  the 
semifolded  cluster  Ul,  while  the  implicit  solvent  fails  to  populate 
it  to  any  significant  degree.  This  semifolded  cluster  is  highly  sim¬ 
ilar  to  a  semifolded  structure  observed  by  Mu  et  al.  in  an  explicit 
REMD  simulation  of  chignolin,22  where  it  was  deemed  critical 
to  the  chignolin  folding  pathway.  These  results  underscore  the 
value  of  sampling  in  an  explicit  solvent  regime  in  the  hybrid 
method  to  reproduce  ensembles  generated  in  the  computation¬ 
ally  much  more  expensive  explicit  solvent  simulations. 

■  CONCLUSION 

The  hybrid  approach  succeeded  in  sampling  explicit  solvent 
simulations  at  a  fraction  of  the  computational  cost.  To  the  degree 
where  implicit  and  explicit  solvent  simulations  show  different 
behavior,  the  hybrid  method  generally  gives  mixed  results  with 
respect  to  thermodynamic  and  structural  properties  of  the  room- 
temperature  ensemble.  However,  its  explicit  modeling  of  solvent 
molecules  recovers  solvent-specific  features  of  energy  landscapes 


such  as  solvent-separated  minima,  and  more  importantly,  ap¬ 
pears  to  reproduce  the  structural  sampling  of  explicit  solvent 
simulations.  In  protein  folding  and  structural  refinement,  implicit 
solvent  simulations  often  fail  to  sample  near-native  conforma¬ 
tions  and  explicit-solvent  may  provide  folding  pathways  that 
implicit  solvent  cannot  sample.  The  hybrid  REMD  approach  pro¬ 
vides  a  means  for  combining  the  enhanced  sampling  of  REMD 
with  explicit-solvent  modeling  at  a  fraction  of  the  computational 
cost  traditional  methods,  allowing  for  simulations  with  larger 
system  sizes  and  longer  time  scales. 
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